This document describes how BigFix uses ICMP during relay selection and how to configure the related settings for deployments of different sizes. Improperly configured ICMP settings can lead to excessive ICMP traffic in certain failure situations.
BES Agents use the ICMP network protocol during the “relay selection” process, when the agent decides which BES Relay (or BES Server) to use as its parent. There are two types of relay selection: automatic and manual.
Automatic relay selection is a feature in BigFix that is designed to help the BES Agents find their optimal BES Relays. Autoselection provides the following benefits:
There are two main concerns with using ICMP in an enterprise network: the bandwidth consumed and the load placed on routers. Bandwidth is a major concern for deployments with large numbers of BES Clients, because many BES Clients running relay selection simultaneously may cause network congestion. The main concern for routers is consuming too much CPU processing ICMP traffic. Routers typically need more CPU to process a packet whose TTL has reached zero, so a router may become CPU constrained before a network link becomes bandwidth constrained.
In some large deployments, ICMP traffic sent from agents during autoselection can potentially cause network problems (including high router load) in certain rare failure scenarios if the agents are not configured properly.
To mitigate these risks, it is very important to constrain the number of ICMP packets sent by configuring the relay selection settings in the BES Agent. These settings control how often ICMP packets are sent, how many are sent, and how the agent handles failure situations (such as its relay becoming temporarily unavailable).
Manual relay selection will generate negligible amounts of ICMP traffic and is considered to have no risk for generating too many ICMP packets.
A BES Server failure event can generate a higher than normal amount of ICMP traffic. If the BES Server is down, BES Client posts will begin to fill up in BES Relay FillDB folders and BES Clients will not be able to register with BES Relays. Once BES Relays have reached the maximum capacity of their FillDB buffer directory, they will begin to reject BES Client posts. Finally, once BES Clients reach their Resist Failure Interval, they will begin to run automatic relay selection. If the BES Server is down, automatic relay selection will fail until the BES Server is available again, unless a failover BES Server is available.
Definitions:
Number of computers
Geographic Distribution
Setting Values:
Name: _BESClient_RelaySelect_MaximumTTLToPing
Default: 255 (Hops)
Description: This value represents the maximum TTL to use in automatic relay selection. The agent sends ICMP packets to relays with increasing TTLs until it reaches this value. For example, if this value is set to 10, then the agent will send ICMP packets with a TTL of 2, 3, 4, ..., 8, 9. In this case, the last ICMP packet sent will have a TTL of 9 (the last TTL sent is 1 less than the MaximumTTLToPing), which means that the packet will not pass the 9th router (and the “Distance to relay” property will never report more than 8).
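The example above can be reproduced with a short sketch. The function name ttl_sequence and its first_ttl parameter are hypothetical, and the starting TTL of 2 is taken only from the worked example in the description, not from a documented agent constant.

```python
def ttl_sequence(max_ttl_to_ping: int, first_ttl: int = 2) -> list[int]:
    """Return the TTL values probed for a given MaximumTTLToPing value.

    first_ttl=2 mirrors the worked example in the text; treat it as an
    assumption of this sketch rather than a documented agent constant.
    """
    # The last TTL sent is one less than MaximumTTLToPing,
    # which is exactly what range() gives with an exclusive upper bound.
    return list(range(first_ttl, max_ttl_to_ping))


if __name__ == "__main__":
    print(ttl_sequence(10))
```

With a setting of 10 the sketch lists the TTLs 2 through 9, so lowering the setting directly shortens the list of probe rounds.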
Tradeoffs: A higher TTL value will allow the BES Client to find BES Relays that are farther away. At the default of 255, a BES Client would be able to reach any computer in practically any network. Higher MaxTTL values will generate more ICMP traffic during automatic relay selection because the BES Client will send “rounds” of ICMP packets until the maximum TTL is reached. A smaller TTL will generate fewer ICMP packets but BES Clients will only be able to find BES Relays that are closer in terms of network hops. If a BES Client is unable to find a relay at a distance less than the MaxTTL, it will attempt to select its failover relay.
Recommendation: The MaxTTL is one of the primary controls for limiting ICMP. In smaller or centralized networks, the ICMP traffic generated by autoselection can be handled by the network, but in larger, more distributed deployments the volume of ICMP packets grows dramatically in proportion to the number of relays, and much more care needs to be taken.
| Small | Medium | Large | Very Large |
centralized | 30 | 20 | 20 | 10 |
slightly distributed | 20 | 10 | 10 | 8 |
moderately distributed | 8 | 6 | 3 | 2* |
highly distributed | 6 | 3 | 2* | 2* |
* For the largest, most distributed customers, it is recommended that autoselection policies be reviewed with BigFix Support. It may not be possible to use relay autoselection on some portions of the network.
Name: _BESClient_RelaySelect_IntervalSeconds
Default: 21600 (Seconds)
Description: The BES Relay selection algorithm will run periodically as specified by this setting. This allows the agent to find a more optimal relay than its current relay.
Tradeoffs: A smaller relay selection interval will allow BES Clients to find closer BES Relays more frequently. For example, if a new BES Relay is installed, the BES Clients will notice only when they do autoselection. Large values minimize the number of ICMP packets in aggregate, but small values allow for faster times to optimal relay selection.
Recommendations: Large deployments should increase this interval significantly to keep the average amount of ICMP down. Smaller deployments can keep the value lower to make sure that BES Clients maintain relay optimality.
| Small | Medium | Large | Very Large |
centralized | 21600 (6 hours) | 21600 (6 hours) | 86400 (1 day) | 86400 (1 day) |
slightly distributed | 21600 (6 hours) | 43200 (12 hours) | 86400 (1 day) | 129600 (1.5 days) |
moderately distributed | 43200 (12 hours) | 86400 (1 day) | 259200 (3 days) | 259200 (3 days) |
highly distributed | 86400 (1 day) | 259200 (3 days) | 604800 (7 days) | 604800 (7 days) |
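To make the aggregate effect of the interval concrete, the following sketch estimates how many scheduled autoselection runs a deployment performs per day. The function name and the client count of 10,000 are hypothetical. The sketch counts only periodic runs, not runs triggered by failures, and it does not model packets per run because the text gives no figure for that.

```python
SECONDS_PER_DAY = 86400


def selection_runs_per_day(client_count: int, interval_seconds: int) -> float:
    """Estimate scheduled relay selection runs per day across all clients."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Each client runs autoselection once per interval.
    return client_count * SECONDS_PER_DAY / interval_seconds


if __name__ == "__main__":
    # 10,000 clients is an illustrative figure, not taken from the text.
    for interval in (21600, 86400, 604800):
        runs = selection_runs_per_day(10_000, interval)
        print(f"{interval:>6} s interval: {runs:,.0f} runs per day")
```

Moving from the 6 hour default to a 1 day interval cuts the number of scheduled runs by a factor of four, which is the reason the larger deployment columns use longer intervals.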
Name: _BESClient_RelaySelect_ResistFailureIntervalSeconds
Default: 600 (Seconds)
Description: This value represents the amount of time a BES Client will wait after its relay appears down before performing BES Relay selection. The BES Client notices that the BES Relay is no longer accepting posts when it tries to send data (post) to it. If the agent fails twice to post, it will consider the BES Relay to be unavailable. The ResistFailure setting is how long the agent waits before running autoselection once it considers the BES Relay to be unavailable. The interval begins at the time of the first failed post.
Tradeoffs: A lower failure interval will allow BES Clients to quickly find alternative BES Relays in the event that a BES Relay is not available. This will give BES Clients a higher connectivity rate when BES Relays are uninstalled or having communication failures. A higher value will allow more resilience if the BES Relays or BES Server is unavailable.
Recommendation: Larger deployments should have higher values to allow more time to recover in the event of a failure before agents run autoselection. Smaller deployments will benefit from a shorter resist failure value (as long as ICMP caused by the BES Server being down is not problematic).
| Small | Medium | Large | Very Large |
centralized | 600 (10 min) | 1800 (30 min) | 3600 (1 hour) | 3600 (1 hour) |
slightly distributed | 1200 (20 min) | 3600 (1 hour) | 3600 (1 hour) | 7200 (2 hours) |
moderately distributed | 3600 (1 hour) | 7200 (2 hours) | 21600 (6 hours) | 21600 (6 hours) |
highly distributed | 7200 (2 hours) | 14400 (4 hours) | 21600 (6 hours) | 21600 (6 hours) |
Name: _BESClient_RelaySelect_MinRetryIntervalSeconds
Default: 60 (Seconds)
Description: If the automatic relay selection fails (no BES Relays were found), the BES Client will try again after this many seconds. The BES Client will double this value on each successive retry that fails to locate a BES Relay. For relay selection to succeed, the BES Client must be able to find and register with a relay or the main BES Server. (BES Clients will fail to find any BES Relay if the BES Server is unavailable).
Tradeoffs: A lower Minimum Retry Interval will allow the BES Client to run relay selection more often and find BES Relays faster once the failure is fixed. A higher value will generate fewer ICMP packets but make failure recovery slower. For example, if a laptop momentarily loses its network connection and can't find any BES Relays, a lower retry interval will allow it to quickly find a BES Relay once the connection is restored. On the other hand, if the BES Server is down causing BES Clients not to find any BES Relays, it would be best not to retry quickly due to the ICMP traffic.
Recommendation: Larger deployments should have higher values to allow more time between autoselection rounds. Smaller deployments will benefit from shorter retry values to recover from failures faster.
| Small | Medium | Large | Very Large |
centralized | 600 (10 min) | 1800 (30 min) | 3600 (1 hour) | 3600 (1 hour) |
slightly distributed | 1200 (20 min) | 3600 (1 hour) | 3600 (1 hour) | 7200 (2 hours) |
moderately distributed | 3600 (1 hour) | 7200 (2 hours) | 21600 (6 hours) | 21600 (6 hours) |
highly distributed | 7200 (2 hours) | 14400 (4 hours) | 21600 (6 hours) | 21600 (6 hours) |
Name: _BESClient_RelaySelect_MaxRetryIntervalSeconds
Default: 7200 (Seconds)
Description: After failing to find a BES Relay, the BES Client will continue to try to find one. Each time it fails, the BES Client doubles the time it waits before the next attempt, until the doubled value would exceed this maximum. From then on, the BES Client retries at this maximum retry interval until it successfully selects a BES Relay.
Tradeoffs: A lower Maximum Retry Interval will allow BES Clients to recover from down times faster, while a higher value will force a longer recovery time. A lower value will cause more ICMP traffic since the BES Client runs relay selection more often.
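The interaction between the minimum and maximum retry intervals can be sketched as a capped doubling schedule. The generator name retry_intervals is hypothetical, and the exact point at which the cap takes effect is one reading of the description, not confirmed agent behavior.

```python
from itertools import islice
from typing import Iterator


def retry_intervals(min_retry: int = 60, max_retry: int = 7200) -> Iterator[int]:
    """Yield successive waits (seconds) between failed relay selections.

    Defaults are the documented defaults for MinRetryIntervalSeconds
    and MaxRetryIntervalSeconds.
    """
    wait = min_retry
    while True:
        # Never wait longer than the configured maximum.
        yield min(wait, max_retry)
        # Keep doubling only until the maximum has been reached or passed.
        if wait < max_retry:
            wait *= 2


if __name__ == "__main__":
    print(list(islice(retry_intervals(), 9)))
```

With the defaults the waits grow from 60 seconds to the 7200 second cap after seven doublings. Raising the minimum shortens the period during which retries are frequent, which is where most of the repeated ICMP traffic would be generated during a prolonged outage.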
Recommendation: Larger deployments should have higher values to allow more time between autoselection rounds. Smaller deployments will benefit from shorter retry values to recover from failures faster.
| Small | Medium | Large | Very Large |
centralized | 7200 (2 hours) | 14400 (4 hours) | 28800 (8 hours) | 28800 (8 hours) |
slightly distributed | 7200 (2 hours) | 14400 (4 hours) | 57600 (16 hours) | 86400 (1 day) |
moderately distributed | 14400 (4 hours) | 86400 (1 day) | 129600 (1.5 days) | 129600 (1.5 days) |
highly distributed | 28800 (8 hours) | 129600 (1.5 days) | 172800 (2 days) | 172800 (2 days) |
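For convenience, the five recommendation tables can be combined into a single lookup. The sketch below copies the values from the tables in this document. The function and variable names are hypothetical, and the sketch only returns recommended values; it does not apply them to any agent.

```python
SIZES = ("small", "medium", "large", "very large")

# Rows are geographic distribution; columns follow SIZES.
MAX_TTL = {
    "centralized": (30, 20, 20, 10),
    "slightly distributed": (20, 10, 10, 8),
    "moderately distributed": (8, 6, 3, 2),
    "highly distributed": (6, 3, 2, 2),
}
INTERVAL = {
    "centralized": (21600, 21600, 86400, 86400),
    "slightly distributed": (21600, 43200, 86400, 129600),
    "moderately distributed": (43200, 86400, 259200, 259200),
    "highly distributed": (86400, 259200, 604800, 604800),
}
# The ResistFailure and MinRetry tables in the text list identical values.
FAILURE_WAIT = {
    "centralized": (600, 1800, 3600, 3600),
    "slightly distributed": (1200, 3600, 3600, 7200),
    "moderately distributed": (3600, 7200, 21600, 21600),
    "highly distributed": (7200, 14400, 21600, 21600),
}
MAX_RETRY = {
    "centralized": (7200, 14400, 28800, 28800),
    "slightly distributed": (7200, 14400, 57600, 86400),
    "moderately distributed": (14400, 86400, 129600, 129600),
    "highly distributed": (28800, 129600, 172800, 172800),
}
TABLES = {
    "_BESClient_RelaySelect_MaximumTTLToPing": MAX_TTL,
    "_BESClient_RelaySelect_IntervalSeconds": INTERVAL,
    "_BESClient_RelaySelect_ResistFailureIntervalSeconds": FAILURE_WAIT,
    "_BESClient_RelaySelect_MinRetryIntervalSeconds": FAILURE_WAIT,
    "_BESClient_RelaySelect_MaxRetryIntervalSeconds": MAX_RETRY,
}
# Cells marked with * in the MaximumTTLToPing table.
REVIEW_WITH_SUPPORT = {
    ("moderately distributed", "very large"),
    ("highly distributed", "large"),
    ("highly distributed", "very large"),
}


def recommended_settings(distribution: str, size: str) -> dict[str, int]:
    """Return the recommended value of each setting for one deployment profile."""
    column = SIZES.index(size)  # raises ValueError for an unknown size
    return {name: table[distribution][column] for name, table in TABLES.items()}


def needs_support_review(distribution: str, size: str) -> bool:
    """True when the MaxTTL table flags the profile for review with support."""
    return (distribution, size) in REVIEW_WITH_SUPPORT


if __name__ == "__main__":
    for name, value in recommended_settings("slightly distributed", "medium").items():
        print(f"{name} = {value}")
```

Profiles flagged by needs_support_review correspond to the starred cells, where autoselection may not be usable on some portions of the network.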